original sequence
Towards Interpretable and Inference-Optimal COT Reasoning with Sparse Autoencoder-Guided Generation
Zhao, Daniel, Shankarampeta, Abhilash, Hu, Lanxiang, Rosing, Tajana, Zhang, Hao
We propose a novel method that leverages sparse autoencoders (SAEs) and clustering techniques to analyze the internal token representations of large language models (LLMs) and guide generations in mathematical reasoning tasks. Our approach first trains an SAE to generate sparse vector representations for training tokens, then applies k-means clustering to construct a graph where vertices represent token clusters and weighted edges capture sequential token transitions. Using this graph, we define an edge-weight based reward function to quantify adherence to established reasoning traces, thereby identifying exploitative reasoning trajectories. Additionally, we measure generation diversity from clustering to assess the extent of exploration. Our findings indicate that balancing both exploitation and exploration is crucial for achieving high accuracy in mathematical reasoning tasks. During generation, the SAE can serve as a scalable reward model to guide generations, ensuring a balanced trade-off between exploitation and exploration. This prevents extreme behaviors in either direction, ultimately fostering a higher-quality reasoning process in LLMs.
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)
Exploring Over-stationarization in Deep Learning-based Bus/Tram Arrival Time Prediction: Analysis and Non-stationary Effect Recovery
Li, Zirui, Yang, Bin, Wang, Meng
Arrival time prediction (ATP) of public transport vehicles is essential in improving passenger experience and supporting traffic management. Deep learning has demonstrated outstanding performance in ATP due to its ability to model non-linear and temporal dynamics. In the multi-step ATP, non-stationary data will degrade the model performance due to the variation in variables' joint distribution along the temporal direction. Previous studies mainly applied normalization to eliminate the non-stationarity in time series, thereby achieving better predictability. However, the normalization may obscure useful characteristics inherent in non-stationarity, which is known as the over-stationarization. In this work, to trade off predictability and non-stationarity, a new approach for multi-step ATP, named non-stationary ATP ( NSATP), is proposed. The method consists of two stages: series stationarization and non-stationarity effect recovery. The first stage aims at improving the predictability. As for the latter, NSATP extends a state-of-the-art method from one-dimensional to two dimensional based models to capture the hidden periodicity in time series and designs a compensation module of over-stationarization by learning scaling and shifting factors from raw data. 125 days' public transport operational data of Dresden is collected for validation. Experimental results show that compared to baseline methods, the proposed NSATP can reduce RMSE, MAE, and MAPE by 2.37%, 1.22%, and 2.26% for trams and by 1.72%, 0.60%, and 1.17% for buses, respectively.
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- Europe > Germany (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (0.46)
Enhancing Recommendation with Denoising Auxiliary Task
Liu, Pengsheng, Zheng, Linan, Chen, Jiale, Zhang, Guangfa, Xu, Yang, Fang, Jinyun
The historical interaction sequences of users plays a crucial role in training recommender systems that can accurately predict user preferences. However, due to the arbitrariness of user behavior, the presence of noise in these sequences poses a challenge to predicting their next actions in recommender systems. To address this issue, our motivation is based on the observation that training noisy sequences and clean sequences (sequences without noise) with equal weights can impact the performance of the model. We propose a novel self-supervised Auxiliary Task Joint Training (ATJT) method aimed at more accurately reweighting noisy sequences in recommender systems. Specifically, we strategically select subsets from users' original sequences and perform random replacements to generate artificially replaced noisy sequences. Subsequently, we perform joint training on these artificially replaced noisy sequences and the original sequences. Through effective reweighting, we incorporate the training results of the noise recognition model into the recommender model. We evaluate our method on three datasets using a consistent base model. Experimental results demonstrate the effectiveness of introducing self-supervised auxiliary task to enhance the base model's performance.
The English Opening.
The one thing I tell most people upon meeting me is, I play chess. Long before I got into the data science career field I have played chess. It has always been a pastime to me. The reason I mention it so consistently is because those who play chess have a very analytical way of thinking. It is a natural adaption from the game itself, depending on your consistency of play.